jl transform
Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry
The Johnson-Lindenstrauss (JL) lemma is a cornerstone of dimensionality reduction in Euclidean space, but its applicability to non-Euclidean data has remained limited. This paper extends the JL lemma beyond Euclidean geometry to handle general dissimilarity matrices that are prevalent in real-world applications. We present two complementary approaches: First, we show the JL transform can be applied to vectors in pseudo-Euclidean space with signature (p,q), providing theoretical guarantees that depend on the ratio of the (p,q)norm and Euclidean norm of two vectors, measuring the deviation from Euclidean geometry. Second, we prove that any symmetric hollow dissimilarity matrix can be represented as a matrix of generalized power distances, with an additional parameter representing the uncertainty level within the data. In this representation, applying the JL transform yields multiplicative approximation with a controlled additive error term proportional to the deviation from Euclidean geometry. Our theoretical results provide fine-grained performance analysis based on the degree to which the input data deviates from Euclidean geometry, making practical and meaningful reduction in dimensionality accessible to a wider class of data.
Reviews: Learning convex polytopes with margin
The paper considers the problem of PAC learning of fat convex polytopes in the realizable case. This hypothesis class is given by intersections of t fat hyperplanes, i.e., hyperplanes with margin gamma. Using standard results, the authors derive that the VC dimension of this class is quadratic in the inverse margin size and thus that the sample complexity is polylogerithmic in this quantity. As their main result, they provide two algorithms for finding with high probability a consistent fat polytope: one with exponential runtime in t and one polynomial greedy algorithm that, however, only guarantees to find a (t log n)-polytope. Complementary, the paper states two hardness of approximation results: one for finding an approximately consistent fat hyperplane, i.e., one with the minimum number of negative points on wrong side (and all positive correctly classified), and one for finding a consistent fat polytope with the minimum number of hyperplanes. Finally, the authors also show how their results relate to the alternative class of polytopes that separate points outside of their gamma-envelope (area with Euclidean distance less or equal to gamma from the polytope boundary).
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
Zandieh, Amir, Daliri, Majid, Han, Insu
Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. An effective approach to compress KV cache is quantization. However, traditional quantization methods face significant memory overhead due to the need to store quantization constants (at least a zero point and a scale) in full precision per data block. Depending on the block size, this overhead can add 1 or 2 bits per quantized number. We introduce QJL, a new quantization approach that consists of a Johnson-Lindenstrauss (JL) transform followed by sign-bit quantization. In contrast to existing methods, QJL eliminates memory overheads by removing the need for storing quantization constants. We propose an asymmetric estimator for the inner product of two vectors and demonstrate that applying QJL to one vector and a standard JL transform without quantization to the other provides an unbiased estimator with minimal distortion. We have developed an efficient implementation of the QJL sketch and its corresponding inner product estimator, incorporating a lightweight CUDA kernel for optimized computation. When applied across various LLMs and NLP tasks to quantize the KV cache to only 3 bits, QJL demonstrates a more than fivefold reduction in KV cache memory usage without compromising accuracy, all while achieving faster runtime. Codes are available at \url{https://github.com/amirzandieh/QJL}.
Learning convex polytopes with margin
Gottlieb, Lee-Ad, Kaufman, Eran, Kontorovich, Aryeh, Nivasch, Gabriel
We present an improved algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin. Our learning algorithm constructs a consistent polytope as an intersection of about t log t halfspaces with margins in time polynomial in t (where t is the number of halfspaces forming an optimal polytope). We also identify distinct generalizations of the notion of margin from hyperplanes to polytopes and investigate how they relate geometrically; this result may be of interest beyond the learning setting.
Learning convex polytopes with margin
Gottlieb, Lee-Ad, Kaufman, Eran, Kontorovich, Aryeh, Nivasch, Gabriel
We present an improved algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin. Our learning algorithm constructs a consistent polytope as an intersection of about t log t halfspaces with margins in time polynomial in t (where t is the number of halfspaces forming an optimal polytope). We also identify distinct generalizations of the notion of margin from hyperplanes to polytopes and investigate how they relate geometrically; this result may be of interest beyond the learning setting.
Learning convex polytopes with margin
Gottlieb, Lee-Ad, Kaufman, Eran, Kontorovich, Aryeh, Nivasch, Gabriel
We present a near-optimal algorithm for properly learning convex polytopes in the realizable PAC setting from data with a margin. Our first contribution is to identify distinct generalizations of the notion of {\em margin} from hyperplanes to polytopes and to understand how they relate geometrically; this result may be of interest beyond the learning setting. Our novel learning algorithm constructs a consistent polytope as an intersection of about $t \log t$ halfspaces in time polynomial in $t$ (where $t$ is the number of halfspaces forming an optimal polytope). This is an exponential improvement over the state of the art [Arriaga and Vempala, 2006]. We also improve over the super-polynomial-in-$t$ algorithm of Klivans and Servedio [2008], while achieving a better sample complexity. Finally, we provide the first nearly matching hardness-of-approximation lower bound, whence our claim of near optimality.
Fast Sparse Least-Squares Regression with Non-Asymptotic Guarantees
Yang, Tianbao, Zhang, Lijun, Lin, Qihang, Jin, Rong
In this paper, we study a fast approximation method for {\it large-scale high-dimensional} sparse least-squares regression problem by exploiting the Johnson-Lindenstrauss (JL) transforms, which embed a set of high-dimensional vectors into a low-dimensional space. In particular, we propose to apply the JL transforms to the data matrix and the target vector and then to solve a sparse least-squares problem on the compressed data with a {\it slightly larger regularization parameter}. Theoretically, we establish the optimization error bound of the learned model for two different sparsity-inducing regularizers, i.e., the elastic net and the $\ell_1$ norm. Compared with previous relevant work, our analysis is {\it non-asymptotic and exhibits more insights} on the bound, the sample complexity and the regularization. As an illustration, we also provide an error bound of the {\it Dantzig selector} under JL transforms.